ML Reviews

scoring rule

For a review, see:

T. Gneiting and A. E. Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378, 2007.

Using the formalism in [[scalable-uncertainties-from-deep-ensembles]]:

Ascribe a numerical score to a predictive distribution $p_\theta(y \vert x)$ that rewards calibrated predictions; higher is better.

A scoring function $S(p_\theta, (y, x))$ evaluates the quality of the predictive distribution $p_\theta(y \vert x)$ relative to an event $y \vert x \sim q(y \vert x)$, where $q$ is the true distribution; i.e. it assesses how well $p_\theta$ encodes the true process $q$, given that the data $y \vert x$ are drawn from $q$. For a proper scoring rule, the expected score $S(p_\theta, q) = \int q(y, x) \, S(p_\theta, (y, x)) \, dy \, dx$ is maximized when $p_\theta = q$.
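A minimal sketch of the "maximized at the true distribution" property, for a discrete outcome with the conditioning on $x$ dropped for brevity. The log score and the sign-flipped Brier score are used here as two standard examples of strictly proper scoring rules; the distribution `q`, the candidate forecasts, and all function names are illustrative assumptions, not taken from the referenced notes.

```python
import math


def log_score(p, y):
    """Log score: log-probability that forecast p assigns to the observed class y."""
    return math.log(p[y])


def brier_score(p, y):
    """Brier score with the sign flipped, so that higher is better."""
    return -sum((p_k - (1.0 if k == y else 0.0)) ** 2 for k, p_k in enumerate(p))


def expected_score(score, p, q):
    """Expected score of forecast p when outcomes y are drawn from q."""
    return sum(q_y * score(p, y) for y, q_y in enumerate(q))


# Hypothetical "true" distribution over three classes (made up for illustration).
q = [0.7, 0.2, 0.1]

# Candidate forecasts: the true distribution, an overconfident one, a uniform one.
candidates = {
    "true": q,
    "overconfident": [0.98, 0.01, 0.01],
    "uniform": [1 / 3, 1 / 3, 1 / 3],
}

results = {
    name: {
        "log": expected_score(log_score, p, q),
        "brier": expected_score(brier_score, p, q),
    }
    for name, p in candidates.items()
}

for name, scores in results.items():
    print(f"{name:>14}  log={scores['log']:.4f}  brier={scores['brier']:.4f}")
```

Under both rules the forecast equal to `q` gets the highest expected score, while the overconfident and uniform forecasts are penalized. Note the property holds in expectation over $y \sim q$: on a single draw, an overconfident forecast can still score higher.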